knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(ggridges)
library(eliter)

Two mode decomposition

We need specific methods for two mode decomposition when the degree distribution differs substantially between the two modes in a network. In other words when boards and board members have different network dynamics and different levels of degrees.

Why minimal members decomposition?

We hope to develop a method that can work on datasets where the exact number of members in an affiliation is unknown - and the exact number of memberships also is unknown.

The method should return results that are relatively similar to the weighted core detection regime - but preferably also be better at the affiliation level. It is a goal that it can retain its two-mode nature.

Our usual weighting scheme for core detection - which relies on the size of affiliations and on the number of ties between nodes is challenged by the missing data - such as we find it in Who's Who data or other sources. Missing data also affects the calculation of the reach graph.

Definition

eliter::minimal.members.decomposition

Theoretical fit

Observations

No cut off!

The main issue is that we do not have a natural cut point. Presently that might still be possible. Most analysis have relatively few levels so choosing the right level of disaggregation is feasible if we have priors on the group size.

Speed

The method scales absurdly well and is lightning fast. The current implementation is not optimized but a simple laptop is able to run the algorithm on very large datasets within seconds.

The innermost level

The innermost level is a clique of individuals

The transposed method

Known issues

Nested affiliations will naturally have linking members equal to the size of the smallest affiliation. This issue is mainly prevalent in high data quality scenarios. In the who's who data people are unlikely to list both affiliations in a nested pair. The solution is to check for nestedness, data cleaning or perhaps to check for uniqueness. The simplest solution is to increase the minimum memberships to three, which still gives nested affilations a head start.

In ultra high data quality scenarios we don't know the number of memberships because of fake affiliations.

We can collapse affilations that have identical members. If done for each level it works somewhat like a betweenness decomposition.

We need to check for multiple memberships in the same affiliation

Simulating missing

Privacy missing: A complete biography is missing. A likely form of missing is an existing biography with no affiliations listed. Lets call these types of missing privacy missing. Here there is not discrimination - any memberships are left out.

Prominence missing: Less prominent positions are left out because of space considerations. A particularly active member leaves out positions in less prominent affiliations because otherwise his biography is going to be too long. It is likely to be somewhat unimportant affiliations that will be left out. But what is important to one person can be trivial for another. Affiliations that produce sector density are likely to be left out, while public commissions are likely to be registers.

Examples

Theoretical example

el            <- list()
el$david      <- tibble(NAME = "David", AFFILIATION = LETTERS[1:3])
el$boris      <- tibble(NAME = "Boris", AFFILIATION = LETTERS[c(1:5, 7)])
el$tony       <- tibble(NAME = "Tony",  AFFILIATION = LETTERS[4:6])
el$theresa    <- tibble(NAME = "Theresa", AFFILIATION = LETTERS[c(1,4, 6)])
el$gordon     <- tibble(NAME = "Gordon", AFFILIATION = LETTERS[c(1,2, 3, 6, 7)])

el$iso1     <- tibble(NAME = paste("iso", LETTERS[1], 1:5), AFFILIATION = LETTERS[1])
el$iso2     <- tibble(NAME = paste("iso", LETTERS[2], 1:5), AFFILIATION = LETTERS[2])
el$iso3     <- tibble(NAME = paste("iso", LETTERS[3], 1:4), AFFILIATION = LETTERS[3])
el$iso4     <- tibble(NAME = paste("iso", LETTERS[4], 1:4), AFFILIATION = LETTERS[4])
el$iso5     <- tibble(NAME = paste("iso", LETTERS[5], 1:4), AFFILIATION = LETTERS[5])
el$iso6     <- tibble(NAME = paste("iso", LETTERS[6], 1:2), AFFILIATION = LETTERS[6])

el$iso7     <- tibble(NAME = paste("iso", LETTERS[3], 1:4), AFFILIATION = LETTERS[7])


el          <- bind_rows(el)
incidence   <- xtabs(~NAME + AFFILIATION, el, sparse = TRUE)

g           <- graph_from_incidence_matrix(incidence)
l.inc       <- minimal.members.decomposition(incidence, minimum.memberships = 2, check.for.nested = T)

map(l.inc, dim)
lev.mem     <- level.membership(l.inc, mode = "two-mode")

p           <- graph.plot(g, text = TRUE, vertex.fill = factor(lev.mem$Level), vertex.shape = factor(lev.mem$type) , edge.color = "black", text.vjust = -1, vertex.size = degree(g), edge.size = 0.2)
p + scale_shape_manual(values = c(21, 22)) + scale_size(range = c(2,6))

Minimal example 2 levels

el            <- list()
el$david      <- tibble(NAME = "Linker: David", AFFILIATION = LETTERS[1:3])
el$boris      <- tibble(NAME = "Linker: Boris", AFFILIATION = LETTERS[1:2])

el$iso1     <- tibble(NAME = paste("Isolate", LETTERS[1]), AFFILIATION = LETTERS[1])
el$iso2     <- tibble(NAME = paste("Isolate", LETTERS[2]), AFFILIATION = LETTERS[2])
el$iso3     <- tibble(NAME = paste("Isolate", LETTERS[3]), AFFILIATION = LETTERS[3])


el          <- bind_rows(el)
incidence   <- xtabs(~NAME + AFFILIATION, el, sparse = TRUE)

g           <- graph_from_incidence_matrix(incidence)
l.inc       <- minimal.members.decomposition(incidence, minimum.memberships = 2, check.for.nested = F)

map(l.inc, dim)
lev.mem     <- level.membership(l.inc, mode = "two-mode")

p           <- graph.plot(g, text = TRUE, vertex.fill = factor(lev.mem$Level), vertex.shape = factor(lev.mem$type) , edge.color = "black", text.vjust = -1, vertex.size = degree(g), edge.size = 0.2)
p + scale_shape_manual(values = c(21, 22)) + scale_size(range = c(2,6))

Minimal example 3 levels

el            <- list()
el$david      <- tibble(NAME = "Linker: David", AFFILIATION = LETTERS[1:3])
el$boris      <- tibble(NAME = "Linker: Boris", AFFILIATION = LETTERS[1:3])
el$tony       <- tibble(NAME = "Linker: Tony", AFFILIATION = LETTERS[1:2])

el$iso1     <- tibble(NAME = paste("Isolate", LETTERS[1]), AFFILIATION = LETTERS[1])
el$iso2     <- tibble(NAME = paste("Isolate", LETTERS[2]), AFFILIATION = LETTERS[2])
el$iso3     <- tibble(NAME = paste("Isolate", LETTERS[3]), AFFILIATION = LETTERS[3])


el          <- bind_rows(el)
incidence   <- xtabs(~NAME + AFFILIATION, el, sparse = TRUE)

g           <- graph_from_incidence_matrix(incidence)
l.inc       <- minimal.members.decomposition(incidence, minimum.memberships = 2, check.for.nested = F)

map(l.inc, dim)
lev.mem     <- level.membership(l.inc, mode = "two-mode")

p           <- graph.plot(g, text = TRUE, vertex.fill = factor(lev.mem$Level), vertex.shape = factor(lev.mem$type) , edge.color = "black", text.vjust = -1, vertex.size = degree(g), edge.size = 0.2)
p + scale_shape_manual(values = c(21, 22)) + scale_size(range = c(2,6))

DEN classic

data(pe13)
data(den)

den    <- den[den$SOURCE != "Events",]
kill   <- c("Arbejdsmiljørådet (Rådsmedlemmer)")
den    <- den %>% filter(!AFFILIATION %in% kill)
den    <- as.den(den)


plot.collapse <- function(l.inc){

  l.inc <- compact(l.inc)
  names(l.inc) <- seq_along(l.inc)  

  d  <- tibble(level = names(l.inc),
               rows = map(l.inc, nrow) %>% unlist(),
         cols = map(l.inc, ncol) %>% unlist()
            )

d    <- gather(d, key = "dim", "value", -level)  

p    <- ggplot(d, aes(x = as.factor(level), y = value, fill = dim)) + geom_col( position = "dodge", color = "black") 
p    <- p + theme_classic() + scale_fill_brewer(type = "qual", palette = "Pastel1", name = "Dimension")
p + xlab("Level") + ylab("Count") + ggtitle("Minimal membership collapse sequence")

#rs <- l.inc %>% map(colSums) %>% map_dbl(mean)

# ggplot(rs, aes(y = values, x = ind)) + geom_jitter() + geom_violin()
}



level.summary <- function(l.inc){
  l.inc  <- compact(l.inc)
  l.g    <- map(l.inc, ~graph_from_incidence_matrix(incidence = .x))
  l.cl   <- map(l.g, clusters)

map_dbl(l.cl, "no")  
l.g %>% map(~bipartite.projection(.x)[[1]]) %>% map(degree) %>% map_dbl(mean)

}

incidence   <- xtabs(~NAME + AFFILIATION, den, sparse = TRUE, drop.unused.levels = T)
incidence@x[] <- 1

incidence   <- incidence[order(rownames(incidence)), ] # Der er noget helt galt, hvor xtabs ikke sorterer rigtigt!
is.unsorted(rownames(incidence))
l.inc       <- minimal.members.decomposition(incidence, 3, check.for.nested = TRUE)

# Membership for affil mangler
lev.mem     <- level.membership(l.inc)

map(l.inc, dim)
l <- length(l.inc)

l.inc[[l]] %>% colSums() %>% sort() %>% as.matrix()
l.inc[[l - 1]] %>% colSums() %>% sort() %>% as.matrix()
l.inc[[l - 2]] %>% colSums() %>% sort() %>% as.matrix()
l.inc[[l - 3]] %>% colSums() %>% sort() %>% as.matrix()
l.inc[[l - 4]] %>% colSums() %>% sort() %>% as.matrix()

l.set <- l.inc[[l]]


incidence <- l.inc[[l - 4]]


l.set %>% rowSums() %>% as.matrix()


l.inc[[l]] %>% graph_from_incidence_matrix() %>% graph.plot.twomode(text = FALSE, edge.color = "grey50", edge.size = 0.5, vertex.fill = V(.)$type, vertex.size = 0)

l.inc[[l]] %>% graph_from_incidence_matrix() %>% graph.plot.twomode(text = TRUE, edge.color = "grey50", edge.size = 0.5, vertex.fill = V(.)$type)

i <- l.inc[[8]]

l.mem <- level.membership(l.inc)

g   <- l.inc[[1]] %>% graph_from_incidence_matrix() %>% bipartite_projection(which = "false")
all.equal(V(g)$name, l.mem$Name)

d <- tibble(level = l.mem$Level, degree = degree(g), closeness = rank(closeness(g)),
           between = rank(betweenness(g)), coreness = coreness(g))

dl <- map_df(d, rank) >= nrow(d) - 500
cor(dl)

# Fejl med haveselskabet
s <- c("Haveselskabet (bestyrelse)", "Haveselskabet (styrelsesråd)")

incidence <- l.inc[[8]]
l.i <- minimal.members.decomposition(l.inc[[8]], 2, check.for.nested = TRUE)
l.i[[1]][, s] %>% colSums() %>% sort(decreasing = TRUE)
den.corp <- den[den$SOURCE == "Corporations",]

incidence   <- xtabs(~NAME + AFFILIATION, droplevels(den.corp), sparse = TRUE)
l.inc       <- minimal.members.decomposition(incidence, 3)

l.inc[[1]] %>% graph_from_incidence_matrix() %>% graph.plot.twomode(text = TRUE, edge.color = "grey50", edge.size = 0.5, vertex.fill = V(.)$type)
load("~/Dropbox/Elite og Tal/Magtelite 2016/data/workspace_4.Rda")


#den.db    <- den.db[!den.db$AFFILIATION %in% den.events$AFFILIATION %>% unique(), ]

incidence   <- xtabs(~NAME + AFFILIATION, droplevels(den.db), sparse = TRUE)
incidence   <- incidence[, colSums(incidence) <= 100]
incidence   <- incidence[rowSums(incidence) > 0,]
incidence   <- incidence > 0
l.inc       <- minimal.members.decomposition(incidence, 3)

l.inc[[7]] %>% colSums() %>% sort() %>% as.matrix()
l.inc[[9]] %>% rowSums() %>% sort() %>% as.matrix()
# load("~/Dropbox/Granted/R/Granted_data/data/Clean data.Rda")
# 
# rcn <- soc.projects$RCN[soc.projects$eu.ssh]
# 
# incidence   <- xtabs(~ org_legalName + `Project RCN`, participants[participants$`Project RCN` %in% rcn,], sparse = TRUE)
# incidence   <- incidence[, colSums(incidence) <= 50]
# l.inc       <- minimal.members.decomposition(incidence, 3)
# 
# tail(l.inc, 10)[[6]] %>% rowSums() %>% sort() %>% as.matrix()



antongrau/eliter documentation built on March 2, 2024, 8:05 p.m.